- ************************************************************
- * *
- * Tutorial To Linux Driver Writing -- Character Devices *
- * *
- * or, *
- * *
- * Now That I'm Wacky, Let Me Do Something (I) *
- * *
- * Last Revision: Apr 11, 1993 *
- * *
- ************************************************************
-
- This document (C) 1993 Robert Baruch. This document may be freely
- copied as long as the entire title, copyright, this notice, and all of
- the introduction are included along with it. Suggestions, criticisms,
- and comments to baruch@nynexst.com. Neither this document, nor the work
- performed by Robert Baruch using Linux, nor the results of said work
- are connected in any way to any of the Nynex companies. This product
- 0% organic as defined by California Statute 4Z//7&A. No artificial
- coloring or flavoring.
-
- ========================
- Introduction
- ========================
-
- There is a companion guide to this Tutorial, the Guide to Linux Driver
- Writing -- Character Devices. This Guide should serve as a reference to
- both beginning and advanced driver writers, and should be used in
- conjunction with this Tutorial.
-
- -=-=-=-=-=-=-
-
- Some words of thanks:
-
- Many thanks to:
-
- Donald J. Becker (becker@metropolis.super.org)
- Don "May the Source be With You!" Holzworth (donh@gcx1.ssd.csd.harris.com)
- Michael Johnson (johnsonm@stolaf.edu)
- Karl Heinz Kremer (khk@raster.kodak.com)
- Pat Mackinlay (mackinla@cs.curtin.edu.au)
- ...others too numerous to mention...
- All the driver writers!
-
- ...and of course, Linus "That's LIN-uhks" Torvalds and all the guys who helped
- develop Linux into a BLOODY KICKIN' O/S!
-
- -=-=-=-=-=-=-
-
- ...and now a word of warning:
-
- Messing about with drivers is messing with the kernel. Drivers are run
- at the kernel level, and as such are not subject to scheduling. Further,
- drivers have access to various kernel structures. Before you actually
- write a driver, be *damned* sure of what you are doing, lest you end
- up having to re-format your harddrive and re-install Linux!
-
- The information in this Tutorial is as up-to-date as I could make it. It also
- has no stamp of approval whatsoever by any of the designers of the kernel.
- I am not responsible for damage caused to anything as a result of using this
- Tutorial.
-
- ========================
- End of Introduction
- ========================
-
-
- CHAPTRE THE FIRSTE : How did *they* get the device driver in the kernel?
- ------------------
-
- You have to realize that device drivers really are part of the kernel. The
- kernel can hook in to the functions in your device driver if you tell it
- the addresses of some standard functions. These standard functions are
- detailed in the Guide.
-
- As a part of the kernel, the code of the device driver must be compiled in
- *with* the kernel. That is, you must alter some Makefiles to compile your
- driver and to get it archived into the chr_drv.a library, or you can
- archive it yourself and link it in to the kernel at a later compile stage.
-
- The first step, before you even write a single line of driver code, is to
- make sure you know how to recompile the kernel. Then go ahead and actually
- do it, to be sure you (and your system) are sane. Of course, you need
- the sources to the kernel. If you have the SLS distribution of Linux, you
- already have the sources in /linux. If you don't have the sources, you
- can get them at one of these fine ftp sites near you:
-
- tsx-11.mit.edu:/pub/linux
- sunsite.unc.edu:/pub/Linux
-
- Briefly, here's how to compile the kernel (at least this is how it's done
- in the SLS release):
-
- Go to /linux (or wherever the source for Linux is)
- You will see a directory which looks a lot like this:
-
- -rw-r--r-- 1 baruch 17982 Nov 10 07:54 COPYING
- -rw-r--r-- 1 baruch 1444 Jan 13 15:24 Configure
- -rw-r--r-- 1 baruch 6934 Feb 22 13:31 Makefile
- -rw-r--r-- 1 baruch 4078 Dec 12 06:45 README
- drwxrwxr-x 2 baruch 512 Feb 22 13:34 boot
- -rw-r--r-- 1 baruch 1724 Feb 9 15:07 config.in
- drwxrwxr-x 8 baruch 512 Feb 22 13:34 fs
- drwxrwxr-x 4 baruch 512 Dec 1 19:40 include
- drwxrwxr-x 2 baruch 512 Feb 22 13:34 init
- drwxrwxr-x 5 baruch 512 Feb 9 15:11 kernel
- drwxrwxr-x 2 baruch 512 Feb 9 15:11 lib
- -rwxr-xr-x 1 baruch 166 Nov 10 07:54 makever.sh
- drwxrwxr-x 2 baruch 512 Feb 22 13:34 mm
- drwxrwxr-x 3 baruch 512 Feb 9 15:11 net
- drwxrwxr-x 2 baruch 512 Feb 22 13:34 tools
- drwxrwxr-x 2 baruch 512 Feb 22 13:34 zBoot
-
- The README file should contain instructions, but here's how anyway:
-
- Log in as root.
-
- make clean (Do this only once. Otherwise you'll have to sit around
- for 45 minutes or so while the whole thing recompiles)
- make config (Answer the questions -- usually needed only the first time)
- make dep (Makes dependencies)
- make (makes the kernel)
-
- You should end up with an Image file. This is the kernel. Put it where
- you like (LILO users should take it from there). To make a bootable disk,
- just pop a DOS formatted disk in drive A, and do:
-
- make disk
-
- ------------------------------------------------------------------------
-
- CHAPTER TWO: The simplest driver you've ever seen.
- ------------
-
- Now, the directory you're interested in is <src>/kernel/chr_drv. This is
- where all the character device drivers are kept. Go to that directory.
- Open up a new file, and call it testdata.c. Here is what you should
- put in it:
-
- ========================================
- File Listing 1: testdata.c
- ========================================
-
- #include <linux/kernel.h>
- #include <linux/sched.h>
- #include <linux/tty.h>
- #include <linux/signal.h>
- #include <linux/errno.h>
-
- #include <asm/io.h>
- #include <asm/segment.h>
- #include <asm/system.h>
- #include <asm/irq.h>
-
- unsigned long test_init(unsigned long kmem_start)
- {
- printk("Test Data Generator installed.\n");
- return kmem_start;
- }
-
- ========================================
-
- The include files are all there for convenience. You may need them later.
- All this driver does is display a message upon initialization.
-
- Now, to get this driver into the kernel, you need to do several things.
- Do the first two things in the chr_drv directory:
-
- I. Get the kernel to call your init function on bootup. To do this,
- edit the mem.c file, and go to the very end to the function
- chr_dev_init. It looks something like this:
-
- long chr_dev_init(long mem_start, long mem_end)
- {
- if (register_chrdev(1,"mem",&memory_fops))
- printk("unable to get major 1 for memory devs\n");
- mem_start = tty_init(mem_start);
- mem_start = lp_init(mem_start);
- mem_start = mouse_init(mem_start);
- mem_start = soundcard_init(mem_start);
- return mem_start;
- }
-
- You need to add your test_init function to the code. Put it right
- before the return:
-
- mem_start = test_init(mem_start);
-
- Save the file.
-
- II. Edit the Makefile to compile testdata.c. Edit the Makefile, and add
- testdata.o to the OBJS list. This will cause the make utility to
- compile testdata.c into an object file, and then add it to the
- chr_drv.a library archive.
-
- Save the file.
-
- The next step is to re-compile the kernel. Go to the <src> directory,
- and do a make from the top as described in the first chapter. There is
- no point in doing a "make clean" or "make config". If all goes well, the
- make should proceed down to chr_drv, and compile your testdata.c file.
- If there are warnings or errors, do a ctrl-C to break out of the make,
- and fix the problem.
-
- Once you are left with an Image file, put the Image file where LILO
- wants it, or use "make disk" to make a bootable disk. It's a good
- idea to save your old Image file (or save the disk it was on).
-
- Now reboot. When Linux comes up again, you should see your message
- printed on bootup after all the character devices' messages, before
- any of the block device messages. If the message came up, have a soda.
- Jump up and down a little. (Well, first jump, _then_ have the soda).
-
- If it didn't work, go back and find out what you did wrong. Are you
- sure you recompiled the kernel? Did it recompile with testdata.c? Did
- you reboot using the new kernel? Are you sure? Are you root? Maybe
- your kernel is bad or old. I have used 0.99pl6, with the new libc.so.4.3.2
- shared library successfully, and I am currently using 0.99pl8 with
- libc.so.4.3.3.
-
-
- ------------------------------------------------------------------------
-
- CHAPTER THREE: A device driver that actually does something useful.
- -------------
-
- This example is taken from the _Writing UNIX Device Drivers_ book by
- George Pajari, published by Addison Wesley. It can usually be found
- in a Barnes and Noble bookstore, or any large bookstore which has
- a nice section on UNIX. The ISBN is 0-201-52374-4, and it was published
- in 1992. This book is highly recommended for the device driver writer.
-
- This device driver will actually be read from. You can open and close
- it (which really won't do much), but the biggest thing it will do is
- allow you to read from it. This driver won't access any external hardware,
- and so it is called a "pseudo device driver". That is, it really doesn't
- drive any device.
-
- Have your Guide handy? OK, now alter your testdata.c file so that it
- looks like this:
-
- ========================================
- File Listing 2: testdata.c
- ========================================
-
- #include <linux/kernel.h>
- #include <linux/sched.h>
- #include <linux/tty.h>
- #include <linux/signal.h>
- #include <linux/errno.h>
-
- #include <asm/io.h>
- #include <asm/segment.h>
- #include <asm/system.h>
- #include <asm/irq.h>
-
- static char test_data[]="Linux is really funky!\n";
-
- static int test_read(struct inode * inode, struct file * file,
- char * buffer, int count)
- {
- int offset;
-
- printk("Test Data Generator, reading %d bytes\n",count);
- if (count<=0) return -EINVAL;
- for (offset=0; offset<count; offset++)
- put_fs_byte(test_data[offset % (sizeof(test_data)-1)], buffer+offset);
- return offset;
- }
-
- static int test_open(struct inode *inode, struct file *file)
- {
- printk("Test Data Generator opened.\n");
- return 0;
- }
-
- static void test_release(struct inode *inode, struct file *file)
- {
- printk("Test Data Generator released.\n");
- }
-
- struct file_operations test_fops = {
- NULL, /* test_seek */
- test_read, /* test_read */
- NULL, /* test_write */
- NULL, /* test_readdir */
- NULL, /* test_select */
- NULL, /* test_ioctl */
- NULL, /* test_mmap */
- test_open, /* test_open */
- test_release /* test_release */
- };
-
- unsigned long test_init(unsigned long kmem_start)
- {
- printk("Test Data Generator installed.\n");
- if (register_chrdev(21,"test",&test_fops)) /* nonzero means failure */
- printk("Test Data Generator error: Cannot register to major device 21!\n");
- return kmem_start;
- }
-
- ========================================
-
- OK, let's go over this. Look first at the test_init function. Notice
- the new function -- register_chrdev. This registers the character device
- with the kernel as using major device number 21. All devices (except for
- the really simple one in the last chapter) use major device numbers to
- be accessed. The kernel has an internal table of devices and their
- associated device functions which is indexed by major device number.
-
- The device numbers go from 0 to MAX_CHRDEV-1. MAX_CHRDEV is defined in
- linux/fs.h, and is currently set at 32. In general, you want to stay away
- from devices 0-15 because those are reserved for the "usual" devices.
- Currently, these usual devices (according to the FAQ) are as follows:
-
- ---Excerpt from FAQ begins---
-
- QUESTION: What are the device minor/major numbers?
-
- The Linux Device List
- maintained by rick@ee.uwm.edu (Rick Miller, Linux Device Registrar)
- February 17, 1993
-
- Many thanks to richard@stat.tamu.edu, Jim Winstead Jr., and many others.
-
- Majors:
- 0. Unnamed . (unknown) .... for proc-fs, NFS clients, etc.
- 1. Memory .. (character) .. ram, mem, kmem, null, port, zero
- 2. Floppy .. (block) ...... fd[0-1]<[dhDH]{360,720,1200,1440}>
- 3. Hard Disk (block) ...... hd[a-b]<[0-8]>
- 4. Tty ..... (character) .. {p,t}ty<{S,[p-s][0-f]}><#>
- 5. tty ..... (character) .. tty, cua[0-63]
- 6. Lp ...... (character) .. lp[0-2] or par[0-2]
- 7. Tape .... (block) ...... t[0-?] (reserved for Non-SCSI tape drives)
- 8. Scsi Disk (block) ...... sd[a-h]<[0-8]>
- 9. Scsi Tape (block) ...... <n>rmt[0-1]
- 10. Mouse ... (character) .. bm, psaux (mouse)
- 11. CD-ROM .. (block) ...... scd[0-1]
- 12. QIC-tape? (character) .. rmt{8,16}, tape<{-d,-reset}>
- 13. XT-disk . (block) ...... xd[a-b]<[0-8]>
- 14. Audio ... (character) .. audio, dsp, midi, mixer, sequencer
-
- ---Excerpt from FAQ ends---
-
- The FAQ goes on to break down the major devices by minor numbers. Each major
- device can be broken down into at most 256 minor devices (0-255). The
- device driver can determine which minor it is supposed to operate on. More
- on that later.
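-
- The split of a device number into major and minor can be pictured with a
- small user-space sketch. It assumes a 16-bit device number with the major in
- the high byte and the minor in the low byte, which matches the 0-255 minor
- range above but should be checked against the MAJOR and MINOR macros in your
- kernel headers. The MODEL_ names and the message strings are made up for
- illustration only.
-
```c
/* Assumed layout: major in the high byte, minor in the low byte. */
#define MODEL_MAJOR(dev)          (((unsigned int)(dev)) >> 8)
#define MODEL_MINOR(dev)          ((dev) & 0xff)
#define MODEL_MKDEV(major, minor) (((major) << 8) | (minor))

/* Pick a message per minor, the way a read routine might. */
const char *model_message_for(unsigned int dev)
{
    switch (MODEL_MINOR(dev)) {
    case 0:
        return "message for minor 0\n";
    case 1:
        return "message for minor 1\n";
    default:
        return "unknown minor\n";
    }
}
```
-
- With this layout, the special file made by "mknod ... c 21 1" would carry
- MODEL_MKDEV(21, 1), and the driver would see minor 1.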
-
- In any case, I've chosen major 21 for experimentation purposes. By the way,
- the name of the driver (here it's "test") is not important. The kernel does
- not do anything with it. [It would be nice if it would. Then you could
- interrogate the kernel and find out what drivers are installed!]
-
- register_chrdev also takes in a pointer to a file_operations structure. This
- structure tells the kernel which function to call for which kernel operation.
- The details of this structure are given in the Guide. For now, what is
- important is that we are telling the kernel to call test_read for read
- operations, test_open for open operations, and test_release for release
- operations.
-
- If a driver has already taken major 21, register_chrdev will return -EBUSY.
- Here, all we do is print a message saying that 21 is already taken.
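-
- To make the "table indexed by major number" idea concrete, here is a
- user-space model of registration. It is only a sketch of the behaviour
- described above, not the kernel's code: model_fops, model_chrdevs and
- model_register_chrdev are hypothetical names, and the -EINVAL return for an
- out-of-range major is an assumption of the sketch.
-
```c
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_CHRDEV 32   /* mirrors the MAX_CHRDEV value quoted above */

/* Hypothetical stand-in for struct file_operations: one handler only. */
struct model_fops {
    int (*read)(char *buffer, int count);
};

/* One slot per major number; NULL means the major is still free. */
static const struct model_fops *model_chrdevs[MODEL_MAX_CHRDEV];

/* Returns 0 on success, -EINVAL for a bad major, -EBUSY if already taken. */
int model_register_chrdev(unsigned int major, const char *name,
                          const struct model_fops *fops)
{
    (void)name;   /* the name plays no part in the lookup */
    if (major >= MODEL_MAX_CHRDEV)
        return -EINVAL;
    if (model_chrdevs[major] != NULL)
        return -EBUSY;
    model_chrdevs[major] = fops;
    return 0;
}
```
-
- Registering major 21 twice in this model succeeds the first time and gives
- -EBUSY the second time, which is the case test_init reports.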
-
- Now, the test_open and test_release functions just print out things to
- the console. They are really there for debugging purposes, so that you
- can see when things happen.
-
- The meat of the driver is the test_read function. The first thing it does
- is print out how many bytes were requested. Then it puts that many bytes
- into user space. Remember that the driver is executing at the kernel level,
- and the user space will be different from kernel space. We have to do
- some kind of translation to put the data which is in kernel space into
- the buffer which is in user space. We use here the put_fs_byte function.
-
- The loop puts the string into the buffer, going back to the beginning of
- the string if necessary. Once the loop is finished, we just return the
- actual number of bytes read. The actual number may be different from the
- requested number. For example, you may be reading from the driver some kind
- of message which has a fixed size. You may want to code the driver so that
- if you attempt to read more than the message size, you will get only the
- message size, and no more. Here, we just give the process however many
- bytes it wants.
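-
- The two policies just described (repeat the string to fill the request, or
- hand back at most one message) can be compared in plain user-space C. This
- is a sketch only: an ordinary array store stands in for put_fs_byte, and the
- model_ names are hypothetical.
-
```c
#include <errno.h>
#include <string.h>

static const char model_data[] = "Linux is really funky!\n";

/* Repeat the message until count bytes are filled, like the loop in test_read. */
int model_read_repeat(char *buffer, int count)
{
    const int len = (int)sizeof(model_data) - 1;   /* exclude the NUL */
    int offset;

    if (count <= 0)
        return -EINVAL;
    for (offset = 0; offset < count; offset++)
        buffer[offset] = model_data[offset % len];
    return offset;
}

/* Alternative policy: return at most one copy of the message. */
int model_read_once(char *buffer, int count)
{
    const int len = (int)sizeof(model_data) - 1;
    int n;

    if (count <= 0)
        return -EINVAL;
    n = count < len ? count : len;
    memcpy(buffer, model_data, (size_t)n);
    return n;   /* may be less than the caller asked for */
}
```
-
- In both cases the return value is the number of bytes actually delivered,
- which is what the calling process sees as the result of read().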
-
- Now, let's get this driver into the kernel. But first what we'll do is
- create a special file which can be opened, read, and closed. Operations on
- this special file will activate your driver code.
-
- The special files are normally stored in the /dev directory. Do this:
-
- mknod /dev/testdata c 21 0
- chmod 0666 /dev/testdata
-
- This makes a special character (c) file called testdata, and gives it major
- 21, minor 0. The chmod makes sure that everyone can read and write the
- device.
-
- Now recompile the kernel, and reboot. Once again, make sure you fix any
- warnings or errors in your testdata.c compilation.
-
- Now, go to the /tmp directory (or wherever you want), and write this
- program:
-
- ========================================
- File Listing 3: data.c
- ========================================
-
- #include <stdio.h>
- #include <stdlib.h>
- #include <sys/types.h>
- #include <unistd.h>
- #include <fcntl.h>
-
- int main(void)
- {
- int fd;
- char buff[128];
-
- fd = open("/dev/testdata",O_RDWR);
- printf("/dev/testdata opened, fd=%d\n",fd);
- if (fd<0) exit(1); /* open returns -1 on failure */
- printf("sizeof(buff)=%d\n",(int)sizeof(buff));
- printf("Read returns %d\n",(int)read(fd,buff,sizeof(buff)));
- buff[127]=0; /* NUL-terminate before printing as a string */
- printf("buff=\n'%s'\n",buff);
- close(fd);
- return 0;
- }
-
- ========================================
-
- Compile it using gcc. Run it. If it said "Linux is really funky!" lots
- of times, pat yourself on the back (or wherever you want) for a job
- well done. If it didn't, check the output, and see where you went wrong.
- It could just be that you have a bad or old kernel.
-
- The last line may be partial, since you're only printing out 127 characters.
-
- ++++++++++++++++++++++
- EXPERIMENT 1
- ++++++++++++++++++++++
-
- Use mknod to make another special file, this one with minor 1. Call it
- something like /dev/testdata2. Change the device driver so that in the
- read call, it finds out which minor is being read from. Use this:
-
- int minor = MINOR(inode->i_rdev);
-
- Print out the minor number, and depending on which minor it is, read
- from a different message string. Test your driver with code similar to
- data.c.
-
- ++++++++++++++++++++++
-
- ------------------------------------------------------------------------
-
- CHAPTER FOUR: You've learned to read, now you're gonna learn to write.
- ------------
-
- Now that you're reading strings, you may want to write strings and read them
- back. We'll go through two versions of this -- one that uses static memory,
- and one that dynamically allocates the memory.
-
- Keeping your current driver, all you need to do is add a write function to
- it, not forgetting to put that write function into the file_operations
- structure of the driver.
-
- Add this section of code to your driver above the file_operations structure
- declaration:
-
- ========================================
- File Listing 4 (partial): testdata.c
- ========================================
-
- static char test_data[128]="\0";
- static int test_data_size=0;
-
- static int test_write(struct inode * inode, struct file * file,
- char * buffer, int count)
- {
- printk("Write %d bytes\n",count);
- if (count>127) return -ENOMEM; /* leave room for the NUL */
- if (count<=0) return -EINVAL;
- memcpy_fromfs((void *)test_data, (void *)buffer, (unsigned long)count);
- test_data[count]=0; /* NUL-terminate the string */
- test_data_size = count;
- return count;
- }
-
- ========================================
-
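- Also, alter the test_read function so that instead of using
- sizeof(test_data)-1 as the length of the test_data string, it uses
- test_data_size. Have test_read return 0 when test_data_size is 0 (nothing
- has been written yet); otherwise the modulo in the loop divides by zero.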
-
- In the test_write function, I have decided to prevent the acceptance of
- strings which are too big to fit (with a NUL-terminator) into the test_data
- area, rather than just writing only what fits. In this case, if the offered
- string is too long, I return -ENOMEM. The write function in the user's
- process will return -1, and errno will be set to ENOMEM.
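-
- Here is a user-space sketch of that contract: the driver-side routine returns
- a negative error code, and the caller ends up with -1 and errno set. The
- model_ names are hypothetical, memcpy stands in for memcpy_fromfs, and
- model_user_write only imitates what the system call layer does for you.
-
```c
#include <errno.h>
#include <string.h>

#define MODEL_BUF_SIZE 128

static char model_store[MODEL_BUF_SIZE];
static int model_store_size = 0;

/* Driver side: refuse anything that will not fit along with its NUL. */
int model_write(const char *buffer, int count)
{
    if (count > MODEL_BUF_SIZE - 1)
        return -ENOMEM;
    if (count <= 0)
        return -EINVAL;
    memcpy(model_store, buffer, (size_t)count);
    model_store[count] = '\0';
    model_store_size = count;
    return count;
}

/* Caller side: a negative driver result becomes -1 plus errno. */
int model_user_write(const char *buffer, int count)
{
    int rc = model_write(buffer, count);

    if (rc < 0) {
        errno = -rc;
        return -1;
    }
    return rc;
}
```
-
- A rejected write leaves the previously stored string untouched, because the
- size checks run before anything is copied.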
-
- Also note that I have used the memcpy_fromfs function, which is real
- convenient -- much more convenient than looping over get_fs_byte.
-
- Compile this driver, and test it by modifying data.c to write some data,
- then read it back.
-
- ++++++++++++++++++++++
- EXPERIMENT 2
- ++++++++++++++++++++++
-
- Re-write the driver so that it can have two different strings for the two
- minor devices as in experiment 1.
-
- ++++++++++++++++++++++
-
- Now that we can write data to the driver, it would be nice if we could
- dynamically allocate memory to store a string in. We will use kmalloc to
- do this. (Why is discussed later)
-
- One thing which must be realized with kmalloc -- it can only allocate a
- maximum of one Linux page (4096 bytes). If you want more, you will have
- to create a linked list.
-
- Change your driver so that instead of listing 4, you have this:
-
- ========================================
- File Listing 5 (partial): testdata.c
- ========================================
-
- static char *test_data=NULL;
- static int test_data_size=0;
-
- static int test_write(struct inode * inode, struct file * file,
- char * buffer, int count)
- {
- printk("Write %d bytes\n",count);
- if (count<=0) return -EINVAL;
- if (count>4095) return -ENOMEM; /* count+1 must fit in one page */
- if (test_data!=NULL) kfree_s((void *)test_data, test_data_size+1);
- test_data_size = 0;
- test_data = (char *)kmalloc((unsigned int)count+1, GFP_KERNEL);
- if (test_data==NULL) return -ENOMEM;
- memcpy_fromfs((void *)test_data, (void *)buffer, (unsigned long)count);
- test_data[count]=0; /* NUL-terminate the string */
- test_data_size = count;
- return count;
- }
-
- ========================================
-
- Here, instead of statically allocating memory for the string, we dynamically
- allocate it using kmalloc. Note first, that if we had already allocated
- a string, we free it first by using kfree_s. This is faster than using
- kfree, because kfree would have to search for the size of the object allocated.
- Here we know what the size was, so we can use kfree_s. kmalloc vs. malloc
- is discussed below.
-
- Next, note that we use the GFP_KERNEL priority in the kmalloc. This causes
- the process to go to sleep if there is no memory available, and the process
- will wake up again when there is memory to spare. In general, the process
- will sleep until a page of memory is swapped out to disk.
-
- In the event of catastrophic memory non-availability, kmalloc will return
- NULL, and we should handle that case. Unfortunately here, we have already
- freed the previous string -- although that could be changed easily by
- kmallocing, then kfreeing.
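-
- The "allocate first, free afterwards" ordering looks roughly like this. It is
- a user-space sketch in which malloc and free stand in for kmalloc and
- kfree_s; the model_ names are hypothetical.
-
```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

static char *model_data = NULL;
static int model_data_size = 0;

/* Replace the stored string only once the new buffer is safely in hand. */
int model_write_dynamic(const char *buffer, int count)
{
    char *fresh;

    if (count <= 0)
        return -EINVAL;
    if (count > 4095)
        return -ENOMEM;
    fresh = malloc((size_t)count + 1);   /* +1 for the NUL */
    if (fresh == NULL)
        return -ENOMEM;                  /* old string is still intact */
    memcpy(fresh, buffer, (size_t)count);
    fresh[count] = '\0';
    free(model_data);                    /* free(NULL) is harmless */
    model_data = fresh;
    model_data_size = count;
    return count;
}
```
-
- The price of this ordering is that both strings exist at the same moment, so
- the peak memory use is higher than with free-then-allocate.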
-
- The rest of the code reads as in listing 4.
-
- When we get into the section on interrupt handling, we will discuss the
- use of GFP_ATOMIC as a kmalloc priority.
-
- A brief excursion into kmalloc vs. malloc:
-
- The malloc() call allocates memory in user space, which is fine if that's what
- you want. Here, we want to have the driver store information so that *any*
- process can use it, and so we have to allocate memory in the kernel. That
- means, kmalloc(). Further, there is a maximum of 4096 bytes which can be
- allocated in any one call of kmalloc. This means that you cannot be guaranteed
- to get contiguous space of over 4096 bytes. You will have to use a linked
- list of kmalloced buffers.
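-
- A linked list of buffers might be organised as below. This is a user-space
- sketch with malloc standing in for kmalloc; the node layout, the 4000-byte
- payload (chosen so that a node plus its header stays under one page) and all
- model_ names are assumptions made for the example.
-
```c
#include <stdlib.h>
#include <string.h>

#define MODEL_PAYLOAD 4000   /* hypothetical: node plus header stays under 4096 */

struct model_chunk {
    struct model_chunk *next;
    int used;                    /* bytes of data[] actually filled */
    char data[MODEL_PAYLOAD];
};

/* Release every node in the chain. */
void model_chain_free(struct model_chunk *head)
{
    while (head != NULL) {
        struct model_chunk *next = head->next;
        free(head);
        head = next;
    }
}

/* Copy len bytes into a chain of nodes; NULL if len <= 0 or out of memory. */
struct model_chunk *model_chain_store(const char *src, int len)
{
    struct model_chunk *head = NULL;
    struct model_chunk **tail = &head;

    while (len > 0) {
        int n = len < MODEL_PAYLOAD ? len : MODEL_PAYLOAD;
        struct model_chunk *node = malloc(sizeof(*node));

        if (node == NULL) {
            model_chain_free(head);   /* give back what was built so far */
            return NULL;
        }
        node->next = NULL;
        node->used = n;
        memcpy(node->data, src, (size_t)n);
        *tail = node;
        tail = &node->next;
        src += n;
        len -= n;
    }
    return head;
}

/* Total number of stored bytes across the chain. */
int model_chain_length(const struct model_chunk *head)
{
    int total = 0;

    for (; head != NULL; head = head->next)
        total += head->used;
    return total;
}
```
-
- Storing 10000 bytes this way produces three nodes holding 4000, 4000 and 2000
- bytes. A read routine would then walk the chain instead of indexing a single
- array.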
-
- Alternatively, you can fool with the init section of the driver, and reserve
- contiguous space for yourself on init (but then it may as well be statically
- allocated).
-
- ------------------------------------------------------------------------
-
- CHAPTER FIVE: For my next trick, I...fall....a...sleep (SNNXXXX!!)
- ------------
-
- The thing which really saves multitasking operating systems is that
- many processes sleep when waiting for events to occur. If this were not
- true, processes would always be burning cycles, and there would really
- be no big difference between running your processes at the same time,
- or one after the other.
-
- But when a process sleeps, other processes get to use the CPU. In general,
- processes sleep when an event they are waiting for has not yet happened. The
- exception to this is processes which are designed to do work when nothing
- is happening. For example, you might have a process sitting around using
- cycles to calculate pi out to a zillion digits. That kind of background
- process should have its priority set real low so that it isn't executed
- often when other (presumably more important) processes have work to do.
-
- Since processes sleep when waiting for events, and said events are usually
- handled by drivers, drivers must cause the processes which called them to
- sleep if not ready. This is the idea behind the select() call, which will
- be dealt with in a later chapter.
-
- To illustrate sleeping and waking processes, we will alter our driver from
- listing 2 by adding a new write function and changing the read function
- around as follows:
-
- ========================================
- File Listing 6 (partial): testdata.c
- ========================================
-
- static char test_data[]="Linux is really funky!\n";
- static int wakeups = 0;
- static struct wait_queue *wait_queue = NULL;
-
- static int test_write(struct inode * inode, struct file * file,
- char * buffer, int count)
- {
- printk("Write %d bytes\n",count);
- wake_up_interruptible(&wait_queue);
- printk("Woke %d processes.\n",wakeups);
- wakeups = 0;
- return count;
- }
-
- static int test_read(struct inode * inode, struct file * file,
- char * buffer, int count)
- {
- int offset;
-
- printk("Test Data Generator, reading %d bytes\n",count);
-
- printk("Process going to sleep\n");
- wakeups++;
- interruptible_sleep_on(&wait_queue);
- printk("Process has woken up!\n");
-
- for (offset=0; offset<count; offset++)
- put_fs_byte(test_data[offset % (sizeof(test_data)-1)], buffer+offset);
- return offset;
- }
-
- ========================================
-
- Don't forget to put the test_write function in the file_operations struct!
- But don't compile this driver just yet! Read on...
-
- The operation of this driver is as follows: On a read, put the process
- to sleep. On a write, wake up all those processes which have gone to
- sleep in this driver. This will allow the processes to complete the read.
-
- There are two new variables here, wakeups and wait_queue. The wait_queue is
- a circular queue of processes which are sleeping. It is FIFO, so that the
- process woken up is the first process which went to sleep.
-
- The kernel handles the queue for us; all we need to do is supply a pointer
- to the queue and initialize it to NULL (i.e., the queue is empty).
-
- We'll use the wakeups variable to tell us how many processes are taken off the
- wait_queue (i.e., woken up) -- which is the number of processes which have
- already gone to sleep. So each time a process goes to sleep, we increment
- wakeups. When a write request comes in, we wake up wakeups processes and reset
- wakeups to zero.
-
- Simple, yes? Now we get into the sticky part.
-
- In the Guide, you see that you can choose two ways of sleeping -- interruptible
- or not. Interruptible sleeps can be interrupted (i.e., the process is woken
- up) by signals (such as SIGUSR1) and hardware interrupts. Non-interruptible
- sleeps can only be interrupted by hardware interrupts. Not even a kill -9
- will wake up a non-interruptible process which is sleeping! Suppose you have a
- signal handler in your process which will react to SIGUSR1 (signal 10 on
- Linux). That is, you can do kill -10 <pid>. What happens?
-
- When the scheduler gets around to checking the signalled process for
- runnability, it sees that there is a signal pending. This allows the process
- to continue to run where it left off, with a twist: when the process leaves
- kernel mode (the driver call) and enters user mode, the signal handler is
- called (if there is one). Once the signal handler function exits, one of
- three things can happen:
-
- (1) If the original system call exited with -ERESTARTNOINTR,
- then the process will continue as if it calls the system call again
- with the same arguments.
- (2) If the original system call did not exit with -ERESTARTNOINTR,
- but with -ERESTARTNOHAND or -ERESTARTSYS, then the process will
- continue exiting from the system call with -1, errno EINTR.
- (3) If the original system call did not exit with -ERESTARTNOINTR,
- -ERESTARTNOHAND, or -ERESTARTSYS, then the process will continue,
- exiting from the system call with whatever was returned.
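-
- The cases in the list can be written down as a small decision function. This
- is only a model of the description given here, not the kernel's signal code:
- the MODEL_ERESTART values are placeholders, and model_after_handler is a
- hypothetical name.
-
```c
#include <errno.h>

/* Kernel-internal codes; the numbers here are placeholders for the sketch. */
#define MODEL_ERESTARTSYS    512
#define MODEL_ERESTARTNOINTR 513
#define MODEL_ERESTARTNOHAND 514

enum model_outcome { MODEL_RESTART, MODEL_RETURN };

/*
 * Decide what happens once the signal handler has run. When the outcome is
 * MODEL_RETURN, *user_rc holds the (kernel-style) result handed back.
 */
enum model_outcome model_after_handler(int driver_rc, int *user_rc)
{
    switch (driver_rc) {
    case -MODEL_ERESTARTNOINTR:
        return MODEL_RESTART;     /* case (1): the call is issued again */
    case -MODEL_ERESTARTNOHAND:
    case -MODEL_ERESTARTSYS:
        *user_rc = -EINTR;        /* case (2): caller sees -1, errno EINTR */
        return MODEL_RETURN;
    default:
        *user_rc = driver_rc;     /* case (3): passed through unchanged */
        return MODEL_RETURN;
    }
}
```
-
- A driver that returns -ERESTARTNOINTR from an interrupted sleep is therefore
- asking for the whole call to be run again from the top.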
-
- You can see most of this (if you can read mutilated 80386 assembly) in
- <src>/kernel/sys_call.S and <src>/kernel/signal.c. Although signal handling
- has been considerably revamped for 0.99pl8, the basic sequence of operations
- is intact across patch levels. -ERESTARTNOHAND is new in 0.99pl8.
-
- This is important -- the driver call should not be completed except for
- cleanup, since the kernel will return an error for you or redo the system
- call.
-
- When the process continues to run before calling the signal handler, it picks
- up where it left off -- in the interruptible_sleep_on function. This function
- takes the process off the wait_queue automatically (which is nice). But then
- wakeups is not updated (which is not so nice). In that case, when a
- subsequent write comes in, the number of sleeping processes reported will
- be wrong!
-
- [pulpit-pounding mode on]
-
- Although for this driver ignoring this is not such a big deal, it is sloppy
- programming for a driver. Driver code must be so perfect that it operates
- like a well-oiled machine, with no slip-ups. One error -- one bit of code
- that gets out of sync -- and you can at least annoy users and make them throw
- up their hands in frustration, and at worst panic the kernel and make users
- throw your code away in frustration! Also, there is nothing worse than
- spending time debugging an application when the bug is in the driver, or
- trying to code around a known driver flaw.
-
- [pulpit-pounding mode off]
-
- So how do we solve this out-of-sync problem?
-
- Fact: ignoring interrupts, all processes are atomic when they are in the
- kernel. That is, unless a process performs an operation which can sleep (like
- the call to kmalloc we visited above), or a hardware interrupt comes in, the
- flow of execution goes from entering the kernel to leaving the kernel, with no
- time taken out to run anything else. This does not mean that the code in user
- space gets to continue to run. If the process leaves the system call and
- is not eligible to run, other processes may run and then later on the system
- call appears to have returned to the process. More on that later.
-
- That fact is good to know. It means that as long as we are sure upon entering
- the test_write call that wakeups contains the correct number of sleeping
- processes, test_write will work 100%. That is, unless a hardware interrupt
- comes in which causes the driver to execute an interrupt handler, we are safe,
- but here we have no such handler, and so we can ignore that for now. We will
- deal with interrupts in a later chapter.
-
- So we know that write doesn't really have to be changed. It's really the
- read that we're concerned about. What we need to do is after we get out
- of interruptible_sleep_on() we see if we were genuinely woken up through
- a wakeup call, or if we were signalled. If we were signalled, then
- we know that the write call wasn't the cause of the wakeup, and so we should
- really decrement wakeups.
-
- Now for some loose ends. Remember that upon signalling, the kernel only
- flags the signal for the process, and sets the process to a runnable state.
- That does not mean that it can run immediately. Another process may get
- to run first, and that process may very well run the driver's write code,
- waking up all processes. Of course, we can consider the signalled process
- to be still asleep when it gets the signal, because it has not yet run its
- signal handler. So when that other process gets to run the write code, the
- number of sleeping processes is indeed correct, and wakeups is set to 0.
-
- But now, when the signalled process is run again, the read code will attempt
- to decrement wakeups, making it -1! The next write will display the wrong
- number of sleeping processes!
-
- One thing saves us -- the fact that we can detect in the read code that the
- write code was executed, simply because wakeups is 0. Remember that wakeups
- is incremented before the sleep, so it is guaranteed to be greater than 0 if
- the write code was not executed before waking up because of a signal.
-
- So if the write code was executed, it really does not make sense to decrement
- wakeups, so we just say that only if wakeups is non-zero do we decrement.
-
- To implement all this, add this code after the sleep:
-
- ========================================
- File Listing 7 (partial): testdata.c
- ========================================
-
-     if (current->signal & ~current->blocked)  /* signalled? */
-     {
-         printk("Process signalled.\n");
-         if (wakeups) wakeups--;
-         return -ERESTARTNOINTR;  /* Will restart call automagically */
-     }
-
- ========================================
-
-
-
- Now that you've got that straightened out, let's add some more confusion to
- the mix. Suppose you're in the driver call, doing nice things, and then
- all of a sudden a nasty timer interrupt (task switch possibility) comes in.
- What now? Will there be a task switch? No. A RUNNING task in the kernel
- cannot be switched out, otherwise all hell would break loose. Whew!
- I'm glad we don't have to pay attention to that!
-
- Well, now that we've gone through all the possible ways signals can make
- your insides twist, you can code the driver. Remember to put listing 7 into
- listing 6!
-
- Here's how we're going to test this driver. Several processes will call
- read (and sleep). When they wake up, they're going to say that they were
- woken up (as opposed to printing out what they just read -- we already know
- that works). One process will do a write to wake the other processes up.
- This is the trigger process. Here is the code for the two types of processes:
-
- ========================================
- File Listing 8: data.c
- ========================================
-
- #include <stdio.h>
- #include <stdlib.h>                   /* exit */
- #include <sys/types.h>
- #include <unistd.h>
- #include <fcntl.h>
- #include <errno.h>
- #include <signal.h>
-
- /* The reader process */
-
- void signal_handler(int x)
- {
-     printf("Called signal handler\n");
-     signal(SIGUSR1, signal_handler);  /* Reset signal handler */
- }
-
- int main(void)
- {
-     int fd;
-     char buff[128];
-     int rtn;
-
-     signal(SIGUSR1, signal_handler);  /* Setup signal handler */
-
-     fd = open("/dev/testdata",O_RDWR);
-     printf("/dev/testdata opened, fd=%d\n",fd);
-     if (fd<0) exit(1);                /* open returns -1 on failure */
-
-     rtn = read(fd,buff,sizeof(buff)); /* sleeps until a write */
-     printf("Read returns %d\n",rtn);
-     if (rtn<0)
-     {
-         perror("read");
-         exit(1);
-     }
-     printf("Process woken up!\n");
-     close(fd);
-     return 0;
- }
-
- ========================================
-
- ========================================
- File Listing 9: trigger.c
- ========================================
-
- #include <stdio.h>
- #include <stdlib.h>                   /* exit, atoi */
- #include <sys/types.h>
- #include <unistd.h>
- #include <fcntl.h>
- #include <errno.h>
- #include <signal.h>
-
- /* The writer process */
-
- int main(int argc, char **argv)
- {
-     int fd;
-     char buff[128] = {0};             /* don't write uninitialized data */
-     int rtn;
-
-     fd = open("/dev/testdata",O_RDWR);
-     printf("/dev/testdata opened, fd=%d\n",fd);
-     if (fd<0) exit(1);                /* open returns -1 on failure */
-
-     if (argc>1)                       /* pid given: signal, don't write */
-     {
-         kill(atoi(argv[1]),SIGUSR1);
-         exit(0);
-     }
-     rtn = write(fd,buff,sizeof(buff));
-     if (rtn<0)
-     {
-         perror("write");
-         exit(1);
-     }
-     close(fd);
-     return 0;
- }
-
- ========================================
-
- Compile these programs using gcc. Now run two or three of the data processes:
-
- data &
-
- The last thing each of these processes should print is
-
- Process going to sleep.
-
- because all of these processes are asleep. Now run the trigger program:
-
- trigger
-
- This should wake up all the other processes, which should say,
-
- Process woken up!
-
- Had the read function returned an error (like EINTR), they would have said
-
- read: <error text>
-
- Now, let's test to see if the signal detection and restart mechanism works.
- Run a single data process in the background via "data &". Remember its
- pid. Now, run the trigger process with that pid as an argument:
-
- trigger <pid>
-
- This will signal <pid> instead of waking it up via write. You should see
- the driver's message, followed by the one from the data process's handler:
-
- Process signalled.
- Called signal handler
-
- but the process should not wake up, since we restarted the call. Only a
- write will stop the call.
-
- ++++++++++++++++++++++
- EXPERIMENT 3
- ++++++++++++++++++++++
-
- Re-write the driver so that instead of always restarting the call, it returns
- with EINTR on signal when the read call's count is a special value or values
- (say anything less than 1000). Test to see if the read call returns EINTR when
- the trigger program signals the reading process.
-
- ++++++++++++++++++++++
-
- ------------------------------------------------------------------------
-
- CHAPTER SIX: I want this, that, that...no, THIS, and that. Or, selects!
- -----------
-
- The select call is one of the most useful calls created for interfacing to
- drivers. Without it, or a function like it, if you wanted to check a
- driver for readiness, you would have to poll it regularly. Worse, you
- would not be able to check multiple drivers for readiness at the same time!
-
- But enough of this. You have select, so rejoice and be happy.
-
- As already implied by the first paragraph, the select system call allows
- a process to check multiple drivers for readiness. For example, suppose
- you wanted the process to sit around and wait for one of two file
- descriptors to be ready for reading. Usually, if a descriptor is not ready
- for reading and you read it, it will put your process to sleep (or "block").
- But you can only read one file descriptor at a time, and here you want to
- essentially block on _two_ fd's.
-
- In that case, you use the select call. The syntax of select was already
- explained in the Guide, so let's go about implementing a select function in
- our driver.
-
- Add the following code to the driver, and put the test_select function in
- the fops structure:
-
- ========================================
- File Listing 10 (partial): testdata.c
- ========================================
-
- static int test_select(struct inode *inode, struct file *file,
-                        int sel_type, select_table *wait)
- {
-     printk("Driver entering select.\n");
-     if (sel_type==SEL_IN)  /* ready for read? */
-     {
-         if (wakeups)  /* Any process is sleeping in here */
-         {
-             select_wait(&wait_queue, wait);
-             printk("Driver not ready\n");
-             return 0;  /* Not ready yet */
-         }
-         return 1;  /* Ready */
-     }
-     return 1;  /* Always ready for writes and exceptions */
- }
-
- ========================================
-
- Here's what this function does. When a process issues a select call with
- this driver as one of the fd's to select on, the kernel will call
- test_select with sel_type being SEL_IN. If wakeups is non-zero (that is,
- processes have read without a process writing) then we will say that the
- driver is not ready for reading. In this case, select_wait will add the
- process to the wait_queue and immediately return. The return of 0 indicates
- that the driver is not ready for the operation.
-
- For any other type of operation (or if there are no processes sleeping in
- read) we say the driver is ready (return 1).
-
- The only thing that must be remembered is that we are using the same
- wait_queue structure for processes sleeping in read and processes sleeping
- in select. This means that writing to the driver will wake up both types of
- processes. If desired, a different wait_queue could be used, and the
- appropriate wake up code would have to be written.
-
- Compile this new code into the kernel. We will test this driver by writing
- a new type of process which will call the select system call. Here is the
- new process' code:
-
- ========================================
- File Listing 11: sel.c
- ========================================
-
- #include <stdio.h>      /* Doesn't hurt, can only help! */
- #include <stdlib.h>     /* exit */
- #include <fcntl.h>
- #include <unistd.h>
- #include <sys/types.h>
- #include <sys/time.h>   /* For FD_* and select */
-
- int main(void)
- {
-     int fd;
-     int rtn;
-     fd_set read_fds;
-
-     fd = open("/dev/testdata", O_RDWR);
-     printf("/dev/testdata opened, fd=%d\n",fd);
-     if (fd<0) exit(1);  /* open returns -1 on failure */
-     printf("Entering select...\n");
-     FD_ZERO(&read_fds);
-     FD_SET(fd,&read_fds);
-     /* highest fd plus one, read set, no write or exception sets,
-        and a NULL timeout so that select sleeps until ready */
-     rtn = select(fd+1, &read_fds, NULL, NULL, NULL);
-     if (rtn<0)
-     {
-         perror("select");
-         exit(1);
-     }
-     printf("Select returns %d\n",rtn);
-     return 0;
- }
-
- ========================================
-
- When the kernel is re-loaded, the first test we will perform is to see
- whether the select call returns immediately given that no processes are
- sleeping in read. Just run sel -- no need to run it in the background.
- You should see something like:
-
- Entering select...
- Driver entering select.
- Select returns 1
-
- This is as it should be -- select has determined that one file descriptor is
- ready for reading.
-
- Our next test is to see whether select sleeps properly. Run this:
-
- data &
- sel &
- trigger
-
- When sel is run, you should see:
-
- Entering select...
- Driver entering select.
- Driver not ready
- Driver entering select.
- Driver not ready
-
- If the driver is not ready the first time, the kernel's select code calls
- the test_select function once more. However, the process is only added
- to the wait queue once -- the first time.
-
- Once the trigger program is run, you should see:
-
- Process has woken up!
- Read returns 1024
- Driver entering select.
- Select returns 1
-
- That is, the data process woke up due to the write, as did the sel process.
- Note that the test_select function is called once again when the sel process
- is woken up. This is also a consequence of the kernel design, and is
- nothing to worry about. Those who are interested in the inner workings of
- the select call should look in the file <src>/fs/select.c.
-
- A word about signals and select. Since the select call in the driver does
- not return any error code -- just 0 or non-0 -- there is no way for the
- driver to decide whether the select call should be restarted or not. If a
- signal interrupts it, select will return -1 with errno set to EINTR.
-
- ------------------------------------------------------------------------
-
- CHAPTER SEVEN: This next chap -- oh, hello! -- this next chapter is about
- ------------- interrupts.
-
- This chapter will be one of the most difficult chapters to go through as a
- tutorial, since some means of generating interrupts must be used to test
- things with. Furthermore, the interrupt must be one which is currently
- unused by the system, and one must be willing to mess around with a hardware
- device which is connected to the IRQ.
-
- I will start out with something more controlled than external interrupts --
- internal, or software, interrupts.
-
- Why internal interrupts? There really is not such a big difference between
- internal and external interrupts. Certainly an IRQ is generated by a
- hardware device, but the hardware IRQ results in a software interrupt. I
- will discuss the required changes for dealing with hardware rather than
- software interrupts later in this chapter.
-
- Note: The following paragraphs deal with 80386/80486 specific stuff. Those
- who are not really interested in the "why" of Linux interrupts may skip
- ahead!
-
- To be able to use interrupts, we must first understand how Linux handles
- interrupts. Interrupts most often require a transfer of execution control
- from one code segment to another, and this may be accomplished in two ways.
- The first is by specifying the descriptor of the other executable segment,
- and the second is by a "gate".
-
- In Linux, three functions are used to initialize gates: set_intr_gate,
- set_trap_gate, and set_system_gate.
-
- set_intr_gate sets up a 32-bit interrupt gate with descriptor privilege
- level (DPL) 0 (the most privileged level).
-
- set_trap_gate sets up a 32-bit trap gate with DPL 0.
-
- set_system_gate sets up a 32-bit trap gate with DPL 3.
-
- Each of these functions enters a gate into the interrupt descriptor table
- (IDT) so that when an INT n instruction is performed, control is transferred
- through the gate in the IDT corresponding to n.
-
- THIS ENDS 80386/80486 DISCUSSION.
-
- The three Linux calls allow us to install an interrupt handler for any
- interrupt from 0x00 to 0xFF. We will use set_intr_gate to install an
- interrupt handler into interrupt 0x90.
-
-